Forest Rescoring: Faster Decoding with Integrated Language Models
Abstract
Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model, which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve significant speed improvements, often by more than a factor of ten, over the conventional beam-search method at the same levels of search error and translation accuracy.
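As a rough illustration of the kind of primitive that k-best algorithms build on, the sketch below lazily enumerates the k cheapest combinations of two sorted candidate lists with a priority queue, instead of scoring every pair. It is a minimal sketch under stated assumptions, not the procedure from the paper: the function name `k_best_sums` and the purely additive cost are hypothetical choices made for this example. With an integrated language model the combination cost is generally not purely additive, so a frontier search of this kind would be approximate rather than exact.

```python
import heapq


def k_best_sums(a, b, k):
    """Return the k smallest values a[i] + b[j].

    Assumes a and b are lists of costs sorted in ascending order
    (hypothetical setup for illustration; lower cost is better).
    """
    if not a or not b or k <= 0:
        return []
    # Start from the best corner of the grid and expand outward.
    heap = [(a[0] + b[0], 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        # Push the two grid neighbours of the popped cell, once each.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (a[ni] + b[nj], ni, nj))
    return out


print(k_best_sums([1, 4, 6], [2, 3, 9], 4))
```

Only the cells on the frontier of the grid are ever scored, so the work grows with k rather than with the full number of pairs.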
Similar resources
Investigations on Phrase-based Decoding with Recurrent Neural Network Language and Translation Models
This work explores the application of recurrent neural network (RNN) language and translation models during phrase-based decoding. Due to their use of unbounded context, the decoder integration of RNNs is more challenging compared to the integration of feedforward neural models. In this paper, we apply approximations and use caching to enable RNN decoder integration, while requiring reasonable m...
Forest-based Algorithms in Natural Language Processing
Liang Huang. Supervisors: Aravind K. Joshi and Kevin Knight. Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates. For example, a parser aims to find the best syntactic tree for a given sentence among all derivations under a grammar, and a machine translat...
A Search in the Forest: Efficient Algorithms for Parsing and Machine Translation based on Packed Forests A DISSERTATION PROPOSAL in Computer and Information Science
Many problems in Natural Language Processing (NLP) involve an efficient search for the best derivation over (exponentially) many candidates. For example, a parser aims to find the best syntactic tree for a given sentence among all derivations under a grammar, and a machine translation (MT) decoder explores the space of all possible translations of the source-language sentence. In these cases, ...
Fuzzy class rescoring: a part-of-speech language model
Current speech recognition systems usually use word-based trigram language models. More elaborate models are applied to word lattices or N-best lists in a rescoring pass following the acoustic decoding process. In this paper we consider techniques for dealing with class-based language models in the lattice rescoring framework of our JANUS large vocabulary speech recognizer. We demonstrate how t...
Direct word graph rescoring using a* search and RNNLM
The usage of Recurrent Neural Network Language Models (RNNLMs) has allowed reaching significant improvements in Automatic Speech Recognition (ASR) tasks. However, to take advantage of their capability for considering long histories, they are usually used to rescore the N-best lists (i.e. it is in practice not possible to use them directly during acoustic trellis search). We propose in this pape...